can learn. There's hype in there. So I will do a little bit of explaining about what machine learning is if you're not familiar with that. Hopefully, I'll explain what it is and what it isn't. "In production," that is slightly a lie. This talk is about a product that was supposed to be in production at this point in time when I committed to doing this several months ago. We've done a lot of work to get it there, but take this with a slight grain of salt, in that I won't actually be able to talk about the learnings of how this thing has scaled so far. But we will be using scikit-learn. I am a data engineer at Simple. I will tell you a little bit more about what Simple is.

First, a little bit more background on me. I am Jeff Klukas. I came to technology through science. I went to grad school at the University of Wisconsin-Madison and worked on the Large Hadron Collider, which is really fun. This is me in the cavern of the CMS experiment in Switzerland, about 100 meters underground. So why do I mention this? Since I now work at a tech company, I mention it probably because it is cool. It was a pretty fun project to work on. I also mention it because when you think of machine learning, you often think of big data. And this was a project that was definitely big data. When this thing is running, there are collisions happening every 25 nanoseconds, and each one is a megabyte. It ends up being a huge amount of data. There's this whole cool trigger system to get it down to a manageable size, and you still end up with petabytes of data. So, petabytes: 1,000 gigabytes is a terabyte, and 1,000 terabytes is a petabyte. And you have hundreds of petabytes of data sitting on a disk somewhere, and they're doing distributed stuff to process it. So I got to do that in grad school. Never once did I do any machine learning in grad school. So huge data sets, no machine learning. Why? Some people did use machine learning in that realm. But it's kind of something you only resort to if you have to. Because in research, you want your results to be interpretable. You want to be able to explain what's going on. And I'm kind of proud of that, that we really did, I feel like, do a good job of relying on interpretable results. If you can make a plot and say, I cut out everything below this line on this plot, that is understandable. And those are the kinds of papers that we were writing. So lots of data, but making these very interpretable things.

On top of it, with machine learning, you're kind of handing data over to a black box. And then you say, what did it do in that black box? And you can talk about algorithms, and you can talk about linear algebra. But there's a lot more context you need to know. And in the end, there are a lot of details that are harder to reproduce. So these are reasons why you might not want to use machine learning for a problem. And what I'm talking about today is going to be machine learning on a very small data set, which is a couple of gigabytes. That was big data, no machine learning. Today, we're talking about machine learning without big data. And there are good reasons to do that or not to do that. And hopefully, you'll leave this with a little bit better understanding of why you might or might not want to use machine learning for your problem.

So a bit of an overview of the story today. We'll talk more about the problem that we're solving; in our case, it's about classifying text chats. We'll talk about that question of why we used machine learning for this problem.
Why might we not want to use it for this problem? We will walk through the steps of developing the model, so you'll get to actually see a bunch of Python code showing you what it looks like to interact with scikit-learn. And it turns out that humans love live demos and seeing things actually happen. So you humans will be able to experience that rather than just seeing slides and believing me. Then we'll talk through some complications of this. So you've built this kind of toy model. You've proved that it can work. And then you actually want to run this thing in production and do useful things with it. What gets complicated about that? How do you manage that whole lifecycle of creating a model and using it? Then lastly, we'll talk about, if you think that what we did here is garbage, how else might you decide to do this if you want to do machine learning? There are lots of different reasons why the shape of your organization and your team and your problem are likely not going to be exactly what is shown here. So other approaches might work for you.

So what is the problem that we're solving? A little bit of context. This is Simple. It's the whole idea of banking remade, with lovely design, equally lovely tools to help you save right inside your account, and genuine human goodness. I'm going to actually be focusing on this genuine human goodness part, which largely comes down to our customer relations. A big piece of what makes Simple different from some other banks is that we have a fantastic customer relations department, which all works at our headquarters in Portland, Oregon. About half of the company is customer relations. And they're great people. And the problem we're talking about today is about making their lives easier. So I work for Simple. I'm remote. Actually, I live in the Columbus area; I'm the only one in Ohio. So I was actually in Portland last week, and I'm glad to be back.

Categorizing chats. So for our customer relations department, the primary way that they end up interacting with customers is through the app. You can send a support message, or a chat. And then a customer relations agent gets to respond to that and be pleasant, and oftentimes sends you a GIF in the response. That data all makes it into our data warehouse. So what you're seeing right here is a fictionalized query, somewhat, to our data warehouse. Amazon Redshift is what we use there. It's a database. You can run a query against it, and you can look at a bunch of this chat data. So chats have a subject and a body when you initially submit them. And after you submit a chat, there is somebody in customer relations who's assigned to looking at the queue and triaging chats, assigning them categories so that they can go to different people who are specialized in different things. So one of the categories you see here is urgent. If you lose your card, it's something that's exposing you to fraud. It's exposing us to fraud. We want to deal with that as fast as we possibly can. So that's part of the reason for triage, is that hopefully we can find these really important things quickly and then make sure that somebody is assigned to them. So these are the categories that you might assign: urgent, customer education, a new product, incidents, other things.

So this is a great situation for machine learning, in that we have this data warehouse. We have this data set with hundreds of thousands of chat messages that we can go take a look at. And we can have a machine come look at this stuff.
And basically what we've got here, subject and body, these are kind of the question. And then category is the answer that we want to get out. So the problem is that we want to do the categorization automatically. We want a machine to handle this so that a human doesn't have to. And we already have this large corpus, this large set of data where we have a recorded answer. A human has already tagged these things. And we can use this as a training data set for figuring out if some approach that we have works or does not work. So luckily, we don't have to spin up Mechanical Turk and give our customer information to a bunch of random people on Mechanical Turk.

Just to be a little bit more explicit about what this looks like, this is in our web app. This is a case where I, Jeff, actually lost my card in St. Louis. And I sent a message. And this is basically what we're talking about here. So you might have this whole long chat with an agent, but it's just this initial contact where I say, hey, I lost my card. And the subject on this was "lost card." We want to categorize this thing so we can figure out who to send it to to get the chat started. These are some of the lovely customer relations people that we have. They are smart, empathetic people. And we don't want them to have to spend their time sitting and just reading through chats to categorize them. We want them to be using their empathy to actually respond to issues.

So how do we approach this problem? Obviously, we want to use machine learning, because it's got sparkles and hearts and big companies use it. It makes me feel warm and cuddly and smart. Those are bad reasons to use machine learning. Like I said before, you do sacrifice a lot of interpretability. So if I tell you that I want to use machine learning for this problem, your first question to me might be, couldn't we do something else? And you can imagine things we could do. We could write some simple rules, like: if the subject contains "lost card," maybe this is something we want to treat urgently. And you could imagine totally solving this by just having a list of rules that some human is coming up with. And that would be a great first pass. See if that works for you. If that works, then please do that instead of doing machine learning.

There are a couple of reasons why you might still want to do this. If you try that, but there's still a whole lot of cases that you aren't catching, there's a good chance that machine learning is going to allow you to get farther in terms of optimizing the accuracy and scope of the thing. It can also get very hairy to maintain a rule set like that. In our case, we have a data science department that wants to own figuring out how we categorize chats, and we have engineering that is actually taking those rules and putting them into production. And we've had a couple of cases where we've tried to do some simple rules, and it ended up being very difficult to keep that communication going. Do the data scientists go in and change the service code? With machine learning, one of the nice things is that you can create a model artifact, and that is something very easy for engineers to then take and plug into another service. It's something that data scientists can create that gets plugged in elsewhere. In our case, we also get to use a bunch of natural language processing techniques, which is a whole field of research.
And it's something that scikit-learn, for example, has great support for. So one reason going down this path was nice is that you get to piggyback on a lot of these things that already exist for the kind of problem that we want to look at. And management wanted us to use machine learning. And that's something. There is validity in that, in that our data science organization has greater ambitions as to the ways that they want to do this. And we want to take a small, understandable question to really prove out what this whole workflow looks like. So implementing this is hopefully the first step towards having infrastructure to be able to tackle other problems with machine learning in the future and do exciting new things.

So if you are in my shoes and you are an engineer who has never implemented a machine learning problem, and you're talking to data scientists, and they're like, we want to do this in Python, we think, and we're going to use scikit-learn, what would you do? You would go to the scikit-learn documentation, and you would try to figure out what on earth is going on. So I'm going to kind of walk through my journey of understanding this stuff. Scikit-learn's documentation is actually pretty great. There's some good stuff in there; I recommend you check it out. So if you go to the front page and you go up to the documentation tab, you'll get to the user guide. And it's actually got this really fantastic first section about what machine learning even is. So I'm just going to give you a little bit of a highlight here. It says a machine learning problem considers a set of n samples of data and then tries to predict properties of unknown data. If each sample is more than a single number, and is, for instance, a multi-dimensional entry, it is said to have several attributes or features. So in our case, we have a data set with two features. There's the body of the message, and there's the subject of the message. So we have two features to feed in, and then we want to get some answer out, a classification.

There are generally two types of machine learning problems. There's supervised learning, and there's unsupervised learning. Unsupervised learning is for the case where you don't have the answers already. So if you didn't have these already classified, this is things like clustering. You can imagine taking all of those chats and saying, hey, algorithm, try to figure out some things that look similar to each other. Maybe that would be useful for even getting a sense of what categories I want to have. And you could take a look at those clusters. What do they have in common? Oh, it looks like lost card things are a category that I might want to consider. That's not what we're doing. We're doing supervised learning, and in particular, we're doing classification. So we already have a data set where we have inputs and what the classification should be. We're going to learn based on that data set, and then we are going to spit out a model that's able to take in new inputs and predict a classification for them.

OK, now we get to the fun stuff, the code. This first slide kind of lays out the whole thing. So be prepared. We're jumping in at the second line; in particular, we are creating a pipeline. So this is one of my favorite things about scikit-learn. It has this whole API with lots of bits and pieces, but it provides this nice kind of wrapper.
You can define a pipeline with a whole bunch of steps for how you want to process your data and then finally apply a classifier to it. And this ends up making our job of taking all of this stuff and turning it into an artifact a lot easier down the line. So this slide serves as an introduction to scikit-learn, and it serves as an introduction to natural language processing. So I'm going to go through each line a little bit here.

So, creating a pipeline. The first step in here, we're giving it a name; we're calling this step preprocess. And then we're passing in this message preprocessor. The message preprocessor and the text processor, in our case, are two classes that we have defined, but they inherit from this scikit-learn interface, a transformer mixin. It gives you this way where you just instantiate a class that's based on this mixin, and it becomes something that you can throw into a pipeline. So the message preprocessor, all it is doing is taking in some chunk of data that has a subject and a body in it, and turning it into a single string. And it has the subject weight in here. The subject weight simply means: include the subject that many times. We found that the subject tends to be more important than the stuff that's in the body. So this gives us a way to kind of dial up how important the subject should be. We just duplicate the subject a bunch of times, append that to the body, and that's the output of this preprocessor. So now we're down to a single string.

Then we use this text processor. It takes in stop words and a lemmatizer. Stop words, this is a concept from natural language processing. These are words like "and," "as," "but," all of those words that don't really add information; they're just kind of connecting the sentence together. So this is something you might want to play around with. We might want to include "Simple," "simple.com," other things that are specific to our domain that really aren't adding information to the message. And then a lemmatizer. This is another natural language processing term. A lemma is the form of a word that you would find in the dictionary. So if you think of "stopping," the word that you would look up is "stop." So a lemmatizer is just a chunk of code that will take in a bunch of words and change "stopping" into "stop," change "going" into "go," et cetera. So these are some of those precanned techniques that you can take advantage of. You can find lots of documentation on that. So this is cleaning it up.

The next steps are a count vectorizer and this term frequency transformer. These are just progressively moving away from the text we had at this point; now we're just kind of creating this mathematical glob that is easier for an algorithm to understand. So I'm not going to go into what those things are specifically. And then finally, now that we have this nice mathematical form of the data, we've turned it into these counts of words and transformed them, we throw it into this classifier. It's a gradient-boosted classifier. Don't ask me about the specifics of gradient-boosted classifiers; one of the data scientists handles that. OK. So this is the whole pipeline, all the steps that our data is going to go through.

Yeah, question. The question is whether XGBoost is part of scikit-learn. Correct, it's not. Great question. So one of the great things about scikit-learn is that it has become this cornerstone for doing machine learning in Python, but it doesn't have to stand on its own.
And a lot of other packages provide wrappers that make it easy to plug things into scikit-learn. So yes, xgboost is a separate package, but it provides a nice wrapper, and you can just throw it into a scikit-learn pipeline like this. And yeah, feel free to ask questions as we go through. Any other questions? Does scikit-learn come with its own classifiers? It does come with its own classifiers. How do you pass stuff into a pipeline object? So yeah, this pipeline object, this is literally a list that we're passing in here to the pipeline object. So it holds all this stuff. And then you'll see in the next slide how we pass data into this whole thing. One last question here. Quick question about scikit-learn. Yep. Is it based in part on the Natural Language Toolkit, or does it have its own implementation of that kind of functionality? Is the Natural Language Toolkit a specific library? Oh, NLTK, is that what it is? I have no idea about the overlap there. I think that we are using NLTK in some of the stuff that we're doing here. I can't remember the specifics of what's built in and what's not.

All right. So we have a pipeline. Let's actually do something with it. The first thing that we have to do is train the model with existing data. So this might trip you up a little bit if you've never heard of pandas. It's part of the Python ecosystem. Quick introduction to the Python data ecosystem: there's NumPy, which is a library that's just efficient arrays and stuff, kind of like basic building blocks. There's SciPy, which is a bunch of algorithms that are useful. And pandas is kind of like data science glue that makes things convenient. So pandas has this read_sql function. You just pass it a database connection and a query, and it spits out something that's called a pandas data frame. And it's basically a big matrix of information with a bunch of convenience functions. So we're pulling all this data out of our data warehouse, the category, subject, and body. And then we're breaking it up: y is the dependent variable, the thing that we want to figure out, which is the category. And X is the independent variable, which is the subject and the body.

Does anything about this line rub you the wrong way? Anything that seems non-idiomatic about this line? Somebody want to call it out? It's the capital X over here. Why on earth is this a capital X? This should be a lowercase x, because it's a variable in Python. But it's a capital X. This is a case of math idiom trumping Python idiom. In linear algebra, in math in general, this is a matrix. The y is a one-dimensional thing; the X is a multi-dimensional thing. So that's why you'll see these X's capitalized.

So we break it up. And then we break it up even more. What on earth is going on here? Train, test, split. We are essentially saying we want to reserve one third of the data for testing later, and we're only going to train this model on two thirds of the data. Why would you want to do that? The more data that you have to train on, the more accurate your model can potentially be. I will explain in a little bit why you want to reserve this. But that's what we're doing. We're reserving one third of the information for testing later. Finally, we call pipeline.fit, and we pass all of this information, this training data set, in there.
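To make that concrete, here's a rough sketch of how those pieces might fit together. The class name, the query, the connection string, and the parameter values are illustrative guesses rather than our actual code; the stop-word and lemmatizer handling is simplified down to CountVectorizer's built-in English stop-word list; and scikit-learn's own gradient-boosted classifier stands in for the XGBoost one.

```python
import pandas as pd
import sqlalchemy
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline


class MessagePreprocessor(BaseEstimator, TransformerMixin):
    """Collapse each chat's subject and body into one string, repeating the
    subject `subject_weight` times so it counts for more than the body."""

    def __init__(self, subject_weight=3):
        self.subject_weight = subject_weight

    def fit(self, X, y=None):
        return self  # nothing to learn from the data

    def transform(self, X):
        return [
            " ".join([row["subject"]] * self.subject_weight + [row["body"]])
            for _, row in X.iterrows()
        ]


pipeline = Pipeline([
    ("preprocess", MessagePreprocessor(subject_weight=3)),
    ("vectorize", CountVectorizer(stop_words="english")),  # word counts, minus stop words
    ("tfidf", TfidfTransformer()),                         # term-frequency weighting
    # The talk uses xgboost's XGBClassifier, which drops into the same slot;
    # scikit-learn's own gradient-boosted classifier stands in here.
    ("classify", GradientBoostingClassifier()),
])

# Placeholder warehouse connection and table name, not the real ones.
warehouse = sqlalchemy.create_engine("postgresql://user:password@redshift-host:5439/warehouse")
df = pd.read_sql("SELECT category, subject, body FROM chat_messages", warehouse)

# Split off the answer as y, keep the inputs as X, and reserve a third for testing.
y = df["category"]
X = df[["subject", "body"]]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=1 / 3)

pipeline.fit(X_train, y_train)
```

The nice part is that the custom preprocessing and the fitted classifier all end up living in that one pipeline object.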
So we're telling it, that X stuff, those are the independent variables, and we're also telling it, these are the answers that we've already created. So once you've done that, you want to go back and use this test data to validate how it is performing. Why do you need to validate? Because there's something called overfitting. So imagine that these black points are some sort of data that you're interested in. You could fit that with this black line here, a nice straight line. You'll notice it doesn't pass through the points, so there's some error there. It's not perfectly modeling the data. But generally, the black dots tend to be down in the lower left corner and up in the upper right corner. This black line is telling you something about the underlying structure. This blue line perfectly fits the data. It hits every point. But if I take some more observations, do you think they are going to fall perfectly on that blue line? Probably not. This blue line tells you absolutely nothing about the underlying structure of the data. So this is overfitting. With this blue line, it's like you have extracted too much out of your training data set, and it's gibberish. It doesn't really tell you anything about predicting future results. So that's why you want to reserve some data: so you can tell, have I overfit? Would this be able to make reasonable predictions for future inputs?

So testing the model looks like this. We have our X test matrix here. We're throwing it into this predict method of the pipeline. That spits out what the model says the answer should be for this test data. And then we're printing out this classification report that considers what the model predicted and then what the actual assignments were that the humans made. It does some statistic-y things, and then it spits out, this is generally how you are performing. We aren't going to go into the specifics of this; these numbers are bogus and don't actually reflect the data that was thrown in there. But that's the kind of thing you can get out. The score is basically telling you, if this were a one, it means you always got it right. Your model's doing great. If this is low, it's giving you a sense of what maximum performance you can expect for future inputs.

So we've actually trained the model. We've put some information through the pipeline. So we have a model that we can use. And now, in order to put that into some context where it can be used in the future, we want to actually wrap it in a Flask service. So this is what it looks like to actually be able to spit out classifications in an API using Flask. We are defining a route. So we're going to have a service where you can hit a URL that looks like this. This is a messages endpoint. You post JSON data here. You pull messages out of that JSON. Then we're calling the pipeline.predict method again; this is the same thing that we called in testing. And then we're just massaging the data out of those predictions and JSONifying it again into something that we can spit back to the user that requested it. (A rough sketch of this endpoint appears after the demo.) Can you pickle the model? We'll get to that. We'll get to that. The question was about pickling and persisting the model.

OK, so live demo time. This is exciting because it means that I can mess up all sorts of things. So over on the right here, we're going to actually start up this Flask app. It's bundled up in this bin/run executable that we created. So it's now running. We have a Flask app going. And you'll see it spit out log messages as we make requests. So let's try to hit this API.
We get a 405. That's because we said that this should be a POST, and we just made a GET request. It's a 405 because we didn't tell it what to do with a GET request. So let's make this a POST instead. So, -X POST. I'm using curl, which is the command line thing for making HTTP requests. We get a 400 this time. Getting a little bit better. We're making a POST request now, but we aren't actually posting any data. So we need to give it the data we're trying to post. So let's actually put in a whole message here. You can just ignore these two IDs; those are just for tracking, so that when you're making a request, you can understand which message you're getting back. But we're putting in a body and a subject here. And this thing is actually running, and we're going to see what classification it makes for this message. Do we like that class label? The correct class label is urgent. Good job.

So let's make another request here. Let's say instead of this, I am interested in some new feature. So let's say joint accounts. I want to ask about joint accounts. And I'm going to put in my body here: I want to share money with my bae. All right, that ended up being customer education. I think that, yeah, if you do a slightly different phrasing, like "I want to share an account," yeah, OK, I got this to say new product earlier. So you can tell this is not perfect; there was not a material difference between those two. So, the shared accounts were a big news thing that we launched in January. We didn't have joint accounts for a long time, so people were very excited about this. But that was a new product.

All right, anybody want to volunteer information for one last request? And we'll see what it does. I'm curious, on your first one, if you change "lost" to "found," what it does. That's a good question. Let's see: I found my card. Yeah, I'm just editing the lost card message up there to say "found." Yeah, it might still. The ASAP part may mark this as urgent. So, interesting. Do the exclamation marks affect this? They may. I don't remember whether we strip out punctuation. We may strip out punctuation. But I'm looking for a volunteer to put something else into this before we move on. What kind of card protection do you offer? OK, what kind of card protection? I'll say protection, protection. What kind do you offer? What kind do you offer, kind people? Yeah, customer education. A lot of these end up getting labeled with customer education, which I think shows that we have a lot of things tagged customer education in our input.

So I am running lower on time, so I am going to keep on chugging and go back to the slides. So great, we have a toy. All right, all right. So we have a nice toy. It doesn't work perfectly, but it's better than not having any classification at all at this point. And we put it in a web app kind of shape. How do we actually take this to production, and allow our data scientists to do their thing and create models, and allow those to actually get into this application? This was supposed to be, oh yeah, OK, I was supposed to say something about the training wheels coming off. Anyways, here are some things that you might want to consider. I'm going to talk about three steps of stuff that we did to turn this into something more sustainable. So the first step of real life here is separating training the model from actually serving the model.
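Pulling the service side of that demo together, a minimal sketch of the endpoint might look like this. The /messages route, the payload fields, and the my_model module are assumptions based on the demo rather than the real service.

```python
import pandas as pd
from flask import Flask, jsonify, request

from my_model import pipeline  # hypothetical module exposing the trained pipeline

app = Flask(__name__)


@app.route("/messages", methods=["POST"])
def classify_messages():
    # Expect a JSON body like:
    # {"messages": [{"id": "1", "subject": "Lost card", "body": "I lost my card ..."}]}
    messages = request.get_json()["messages"]
    frame = pd.DataFrame(messages)  # the pipeline expects rows with subject and body
    predictions = pipeline.predict(frame)
    return jsonify([
        {"id": message.get("id"), "category": str(category)}
        for message, category in zip(messages, predictions)
    ])


# Roughly what the curl requests in the demo were doing:
#   curl -X POST localhost:5000/messages \
#        -H 'Content-Type: application/json' \
#        -d '{"messages": [{"id": "1", "subject": "Lost card",
#             "body": "I lost my card in St. Louis, please help ASAP!"}]}'
```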
Again, the split is kind of like: data scientists mostly care about training the model, and engineers care about serving the model. The one is kind of a batch-oriented thing; the other is something we want to serve in real time in a service. So to do that, we need to persist the model somewhere and have it someplace where the service can pick it up. And somebody asked the question of, did you use pickle to do this? Unfortunately, yes. Yes, we did. And why did we do that? Because if you look in the documentation for scikit-learn, it has a section about model persistence. And it says, hey, there's built-in support for pickle. It has some issues. If you don't want to use pickle, you're on your own. And that is pretty much it. This is one of the hairiest things, actually. And this is something that I'll talk a little bit about later, what some larger companies do to solve this problem. But we are using pickle.

Issues with pickle, if you aren't familiar with it: pickle is a built-in Python library for taking a Python object and serializing it into something you can put in a file. The issues with it are, number one, that it's essentially code, and if you unpickle something, it gets a chance to just run arbitrary code. So if your friend says, hey, I found this file, you want to unpickle it? Say no. Don't do it. You want to trust where you got your pickle from before you unpickle it. That's true in life, too. Whew. Well, this got exciting.

So in our case, we're using pickle. And the place that we're putting this, we're all in Amazon Web Services, so S3 is the object store in AWS. So we're creating an S3 client. KMS is Amazon's key management service. So what we're doing here is we're dumping our pipeline object to a pickle file. We're then encrypting that pickle file, and we're putting that object to S3. Why are we encrypting it? Again, we're using this key management thing. This is to guarantee that it came from this service that has access to the key. Like, whoever created this pickle had access to the key. And hopefully, it wasn't some rogue engineer that has access to many things. Hopefully, it was actually the service. Yep. So that's what it looks like for us to turn this thing into a pickle and put it in S3.

And then our Flask app pulls that thing down when we start the app. That's what we decided; we have no way of automatically swapping it. If we want to change which model we're using, we just change the config file and restart the service in order to pull in that new updated model, with more training data or with updated logic. And that's one of the really nice things about the scikit-learn pipeline: it packages together all the logic of the pipeline along with the trained state. So you pull this out, you decrypt it, and then you have it available for your app to use.

Step two is providing an environment for doing this batch training and evaluation. I'm going to go through this quickly. This is the actual training of the model. Data scientists could do that on their laptop. In our case, it's not super computationally expensive. So you could just do that, create your model there, and then have the data scientist put it in S3. But we want some more guardrails than that, because what if they pulled in some random version of something or whatever? So when we train the model, we want to know what version of the code was used, what all the libraries are, and to really be able to reproduce this model if it came to it.
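To make that pickle-and-S3 step concrete, here's a rough sketch. The bucket, key, and KMS alias are placeholders, and where we encrypt the pickle with KMS before uploading, this sketch leans on S3's server-side KMS encryption as a simpler stand-in.

```python
import pickle

import boto3

s3 = boto3.client("s3")
BUCKET = "example-model-artifacts"           # placeholder bucket
KEY = "chat-classifier/pipeline.pkl"         # placeholder object key


def save_pipeline(pipeline):
    """Serialize the trained pipeline and store it in S3, KMS-encrypted at rest."""
    s3.put_object(
        Bucket=BUCKET,
        Key=KEY,
        Body=pickle.dumps(pipeline),
        ServerSideEncryption="aws:kms",
        SSEKMSKeyId="alias/example-model-key",  # placeholder KMS key alias
    )


def load_pipeline():
    """Fetch the artifact at service startup. Only unpickle from a bucket you trust."""
    body = s3.get_object(Bucket=BUCKET, Key=KEY)["Body"].read()
    return pickle.loads(body)
```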
You also might want to do really computationally expensive things when you're training the model. So something that we could do is this grid search thing. What this whole blob is doing is just saying: hey, our pipeline has all these parameters we're passing in, and we don't know if those are the best parameters. Let's try out a whole bunch of different options and a bunch of different combinations. And 5 times 3 times 3 times 2 times 2 times 2, this is already 360 different combinations. Let's just run this whole process to fit the model on each one of those combinations (there's a rough sketch of what that might look like a little further down). And that's where you start being like, well, it would be nice to have an environment to just do all of that stuff and not do it on my laptop. So that's something that you might want to consider.

And step three is, when you actually have your app running, how do you know if it's doing reasonable things? So you want to do all the things that you would do with any other web service. You want to monitor the performance. You have to figure out how you adapt to your production load if there are spikes. And when that happens, do we get pretty consistent performance from the model, or does that vary a lot? How do you degrade gracefully if something goes terribly wrong and other applications that are now relying on getting these predictions back don't get predictions back? And as I said at the beginning, I don't have a whole lot to say on this subject yet, because we have not actually started relying on this yet. So we have a toy. We know that the general approach works. But we shall see how it goes when we actually start relying on this in production.

So the last thing I want to touch on is, what are some other approaches, if you don't want to put together your own batch process and figure out how you're going to do all the persistence and all of that? Some considerations: how big is your team? For us, we have a team of three data engineers right now. We have a team of half a dozen data scientists. But all of us were working on this part time, and it took us three months to build what we built. A lot of companies that really care about machine learning have a whole team of machine learning engineers and such, and this starts to look very different in that case. If you're really big and you have lots of engineering resources to throw at it and you care a whole lot about the performance of the model, what a lot of people will do is actually train the model in scikit-learn or in Python, since data scientists are familiar with that, and they can create some artifact. But you choose some custom serialization format, like dumping it out to some JSON with the parameters. And then you might have an application written in Java or Scala or something else to actually run the model and get better performance that way. And you've got to define your own interface to get from one language to the other. And that takes a whole lot more effort than what we threw at this.

So, how large of a problem space do you need to cover? If you do that sort of thing that I just described, defining your own custom serialization format between one and the other, you have to do that for every different kind of model that you want. So this is something where, right now, we don't know what all kinds of models we might want to use. So we wanted to focus on doing something that we could adapt to other stuff in scikit-learn. And we might want to spin off some completely different model.
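Here is the rough sketch of that grid search promised above; the parameter names and value grids are illustrative, reusing the pipeline and training data from the earlier sketch rather than our actual configuration.

```python
from sklearn.model_selection import GridSearchCV

# Hypothetical parameter grid over the pipeline steps defined earlier;
# keys use scikit-learn's "<step name>__<parameter>" convention.
param_grid = {
    "preprocess__subject_weight": [1, 2, 3, 5, 8],       # 5 options
    "vectorize__ngram_range": [(1, 1), (1, 2), (1, 3)],  # 3
    "vectorize__min_df": [1, 5],                         # 2
    "classify__n_estimators": [100, 200, 300],           # 3
    "classify__learning_rate": [0.05, 0.1],              # 2
    "classify__max_depth": [3, 5],                       # 2
}
# 5 * 3 * 2 * 3 * 2 * 2 = 360 combinations, each fit once per cross-validation fold.

search = GridSearchCV(pipeline, param_grid, cv=3, n_jobs=-1)
search.fit(X_train, y_train)
print(search.best_params_)
print(search.best_score_)
```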
So it's kind of nice that right now we could do something that is significantly different from this natural language processing problem and throw it in there. So those are some of the considerations you might think about. There are also companies that would like to take your money. I don't know how legit any of these are; these are some of the ones that I found. Yhat has a ScienceOps platform. There's Anaconda Enterprise. There's Domino. They all have these nice shiny marketing materials, and they claim that they're going to solve your communication issues and everybody is going to be able to live in their happy land. And there's some variation of: we'll provide infrastructure for you to run this stuff, and your data scientists can click on buttons and decide what version of a model is being promoted. So it solves a bunch of those problems. We didn't go too far down the path of understanding those.

All right, I am going to finish up here. If you want to train and test in a batch environment, even if you want to serve your results in real time, you have this nice concept of serializing something and having this artifact of your trained model that you can then use in a different context. And scikit-learn's pipeline module is really helpful for making that happen. And then, actually serving the model in a real-time context, once you figure the rest of this out, looks a whole lot like serving any other application, and likely the details of your own environment apply there. So thank you. And we have approximately eight minutes for questions. I am on Twitter as Jeff Klukas. I have already posted these slides, so you can see them on Twitter. I will try to remember to post them in Slack, or if somebody would like to go ahead and post that link in Slack, that would be good. What questions do you have?

We have a question here. Once it is in production, do you have any way of learning from the results of the thing? So that is, say, 95% of the queries are classified as customer education; you might be able to detect that there's an issue with your training materials and so forth. Yeah, our plan is, first of all, we're going to be logging all these results to a database. OK, yeah, I will repeat the question. So the question was, how will we know, as we're using this in production and we get answers out of it, how well we're doing? And if it's way overclassifying things into one category, how are we going to react to that? We are planning to persist all of these classifications that happen to a database, so it'll be kind of a flexible data set that we can look at. Also, we're expecting that if we classify things wrong, humans will still end up reclassifying them. So that'll be a very interesting thing to look at, how many of these things get reclassified. And then it's to be seen how much of that we feed back into the model and exactly what parameters we use to filter what we train into new versions of the model.

So as you're prototyping and running it, does it take a lot of resources at that stage? So, does this take a lot of resources while you're prototyping it? That probably depends on what you're doing. But in our case, it's the query to the database that's the expensive thing. So I pull that in, and then I can pretty quickly train the model. In this case, it takes a second or two. But again, that very much depends on what algorithm you're using.
There are more expensive classifiers and more expensive processing that you can do on data. So, you mentioned that engineers end up serving the model. Have you found good efficiencies or good optimizations? Because you just showed it with a Flask app, and more likely different engineering groups are going to have their own different ways of shipping code and stuff. Have you found good optimizations for that, or is that still an open problem? So the question is, what optimizations have we found? We're running a Flask app; how does that perform? And we don't know at this point. We expect it to be pretty low volume at this point, so we're not expecting to see performance issues. But it will be very interesting to see how we're able to handle it. Anyway, we tried putting a pickle file in the repo of the API on GitHub. So if you're using Docker or Kubernetes or something, and you're building that image on every pull request merged to master, you have your pickle file in the repo, it creates the image, and then you use Kubernetes and you have all your pods. And you basically are scaling up, in the sense that your users are going to get different pods serving whatever the model was as of that morning, which is kind of a hand wave. I'm just curious about other implementations of that. Yeah, there's not much we've learned yet in that area.

And in your example, how many messages were in your training set? How many messages were in our training set? I should have an answer for that. Something on the order of 100,000, I think. Over here. What if your data scientists are using a different tool, like R or something? Would this approach be something you could modify to accommodate that? So yeah, if our data scientists were using something like R, would we be able to modify this approach? We kind of decided to go all in on Python. Most of the web services we create are in Scala. Spinning up a Flask app in Python is weird for us, so that's somewhat uncharted territory. But we decided it's worth doing because it unlocks the whole Python data ecosystem, and we can just train the model in Python and then serve it in Python. And that gives us a lot of flexibility for right now. So the short answer is, we would also have to figure out how to serve an R model if we wanted to do that, and that is not something we're planning to do. Yep.

I was wondering if you had any difficulties building a batch training environment in Python, and also getting buy-in to actually use a Python environment. So, OK: did we have trouble building a batch training environment that we could actually get buy-in for? We already happened to have a scheduled task service running that happens to be in Python. It's using Celery to basically be a really dumb and expensive cron solution. So that was something that already existed. We're like, this is Python; we can throw Python code in here. So that's what we're using. It was convenient for our particular situation, and our data scientists were already used to creating jobs in there.

I heard the term reinforcement learning. When you're in production and humans do routine reclassifying, does that get you into the realm of reinforcement learning? So yeah, the question is, when we have humans going in and reclassifying things, does that get into the realm of reinforcement learning if we're throwing that back into the model? I am not super familiar with the term reinforcement learning, so I am not going to make a comment on that, because I don't want to spread misinformation.
But likely, likely yes. So, with the obvious rules approach you might catch phrases like "please help me." Have you seen examples where the machine learning goes past those obvious rules and finds the exceptions? Like someone says, please help me learn about this new product. Have you seen examples of when it manages to find those exceptions? Yeah. So the question is about, if you have a concept of "please help," is it able to find variants of that that a human wasn't anticipating? Not off the top of my head, but if you play around with this, yes, that idea of stemming words, like lemmatizing, works pretty well. So yeah, it's something worth playing around with.

All right, and we'll take one more question. I know you mentioned this in the beginning of the talk, so I apologize if you have to reiterate. What is the goal, I guess, of all this? You already have human beings classifying this information. What are you hoping to get out of having a machine do that? Yeah, so we already have a process for humans classifying these chats; what's the benefit of having a machine do this? It's two things. First of all, it's getting that workload off of a human, so that's a human who can go do something else. I'm sorry, the workload, though, is just picking it from a dropdown, right? Yeah, so the workload is reading through the message, understanding what it is, picking some category from a dropdown, and going through that. So it removes a human having to be dedicated to that task and lets them do what they are more talented at doing. It also reduces the amount of time to get to that classification. So even if you have a triage system set up, maybe somebody is going in once an hour and working down the list of messages. Whereas as soon as the message comes in, within seconds, it can have that first step of classification. So then, for your people that are assigned to working urgent tickets, those are immediately available in that queue. So we're hoping that it's going to bring down the mean time to resolution on those urgent issues. Thank you, everybody. Happy day.